Prediction of protein structure classes by incorporating different protein descriptors into general Chou's pseudo amino acid composition

J Theor Biol. 2014 Nov 7:360:109-116. doi: 10.1016/j.jtbi.2014.07.003. Epub 2014 Jul 12.

Abstract

Identifying a protein's structure helps researchers infer its biological function, yet it remains a challenging problem. The most common way to determine an unknown protein's structural class is through expensive and time-consuming manual experiments. Because amino acid sequences are now widely available in the post-genomic age, an unknown protein's structural class can instead be predicted with machine learning methods from its amino acid sequence and/or its secondary structural elements. Following recent research in this area, we propose a new machine learning system that combines several protein descriptors extracted from different protein representations, such as the position-specific scoring matrix (PSSM), the amino acid sequence, and secondary structural sequences. The prediction engine of our system is an ensemble of support vector machines (SVMs), where each SVM is trained on a different descriptor. The outputs of the individual SVMs are combined by the sum rule. Our final ensemble produces a success rate that is substantially better than previously reported results on three well-established datasets. The MATLAB code and datasets used in our experiments are freely available for future comparison at http://www.dei.unipd.it/node/2357.
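
The sum-rule fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation. The function name `sum_rule`, the class labels, and the score values are all hypothetical, and the sketch assumes each classifier's per-class scores have already been brought to a comparable scale before they are added.

```python
def sum_rule(score_sets):
    """Fuse per-classifier class scores by the sum rule.

    score_sets: list of lists; each inner list holds one classifier's
                score for every class, in the same class order.
    Returns (index of the winning class, list of summed scores).
    """
    n_classes = len(score_sets[0])
    if any(len(scores) != n_classes for scores in score_sets):
        raise ValueError("every classifier must score the same set of classes")

    # Add up the scores each class received across all classifiers.
    totals = [sum(scores[c] for scores in score_sets) for c in range(n_classes)]

    # The predicted class is the one with the largest summed score
    # (ties resolve to the lowest class index).
    best = max(range(n_classes), key=totals.__getitem__)
    return best, totals


# Illustrative labels only; the abstract does not list the class names.
CLASSES = ["all-alpha", "all-beta", "alpha/beta", "alpha+beta"]

# Hypothetical, already-normalized scores for one protein, one list per
# descriptor-specific classifier (PSSM, sequence, secondary structure).
pssm_scores = [0.2, 0.5, 0.2, 0.1]
sequence_scores = [0.4, 0.3, 0.2, 0.1]
secondary_scores = [0.1, 0.6, 0.2, 0.1]

best, totals = sum_rule([pssm_scores, sequence_scores, secondary_scores])
print(CLASSES[best], totals)
```

Here the sequence-based classifier alone would favor the first class, but the summed evidence from all three descriptors selects the second. This is how the sum rule lets classifiers trained on different representations correct one another.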

Keywords: Ensemble of classifiers; Machine learning; Protein descriptors; Protein structure class; Support vector machines.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Artificial Intelligence
  • Models, Genetic*
  • Protein Conformation*
  • Proteins / classification*
  • Proteins / genetics*
  • Software*
  • Support Vector Machine

Substances

  • Proteins